Browsing Semi-structured Web Texts Using Formal Concept Analysis
Authors
Abstract
Query-directed browsing of unstructured Web-texts using Formal Concept Analysis (FCA) confronts two problems. Firstly, on-line Web-data is sometimes unstructured, so any FCA-system must include additional mechanisms to structure its input sources. Secondly, many on-line collections are large and dynamic, so a Web-robot must be used to extract data automatically. This paper addresses both issues. We report on the construction of a Web-based FCA system for browsing classified advertisements for real-estate properties. Real-estate advertisements were chosen because they are typical of the semi-structured textual information sources accessible on the Web. Furthermore, the analysis of real-estate data using FCA is a classic example used in introductory courses on FCA. However, unlike the classic FCA real-estate example, whose input is a structured relational database, we automatically mine Web-based texts for their structure.
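As a rough illustration of the idea in the abstract, the following minimal sketch mines attributes from advertisement text and then derives the formal concepts of the resulting context. It is not the system reported in the paper: the sample advertisements, the regular-expression patterns, and the function names (build_context, intent, extent, concepts) are all assumptions made for illustration, and the brute-force enumeration is only practical for very small contexts.

```python
import re
from itertools import combinations

# Hypothetical sample advertisements (invented for illustration only).
ADS = {
    "ad1": "2 bedroom unit, lock-up garage, close to beach",
    "ad2": "3 bedroom house with garage and pool",
    "ad3": "2 bedroom unit near shops",
}

# Hypothetical attribute patterns; a real parser would need far more care.
PATTERNS = {
    "unit": r"\bunit\b",
    "house": r"\bhouse\b",
    "garage": r"\bgarage\b",
    "2br": r"\b2 bedroom\b",
    "3br": r"\b3 bedroom\b",
}


def build_context(ads, patterns):
    """Map each advertisement to the set of attributes found in its text."""
    return {
        name: frozenset(
            attr for attr, pat in patterns.items()
            if re.search(pat, text, re.IGNORECASE)
        )
        for name, text in ads.items()
    }


def intent(objects, context, all_attrs):
    """Attributes shared by every object in `objects`."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def concepts(context, all_attrs):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects; fine for a toy context only.
    """
    found = set()
    objs = list(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            b = intent(subset, context, all_attrs)
            found.add((extent(b, context), b))
    return found


if __name__ == "__main__":
    ctx = build_context(ADS, PATTERNS)
    for ext, itt in sorted(concepts(ctx, PATTERNS), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each printed pair is one node of the concept lattice: a set of advertisements together with exactly the attributes they have in common. Browsing then amounts to moving between such nodes by adding or removing attributes.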
Similar resources
Browsing Semi-structured Texts on the Web using Formal Concept Analysis
Browsing unstructured Web-texts using Formal Concept Analysis (FCA) confronts two problems. Firstly, on-line Web-data is sometimes unstructured and any FCA-system must include additional mechanisms to discover the structure of input sources. Secondly, many on-line collections are large and dynamic so a Web-robot must be used to automatically extract data when it is required. These issues are add...
Recovering Structure from Unstructured Web-accessible Classified Advertisements
This paper describes a research prototype system called RFCA for structuring Web-accessible rental classified advertisements based on semantic content. A hand-crafted parser is used to extract various facets of the rental property being advertised, including, amongst others, number of rooms, type of garage, dwelling type (unit, house, or high-rise apartment), price and contact details. The perform...
Efficient Text and Semi-structured Data Mining: Knowledge Discovery in the Cyberspace
This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over texts such as proximity phrase association patterns and ordered and unordered tree patterns modeling unstructured texts and semi-structured data on the Web. Then, we consider the problem of finding the patterns that opti...
Mining Association Rules from Semi-Structured Data
Despite the growing popularity of semi-structured data such as Web documents, most knowledge discovery research has focused on databases containing well-structured data. In this paper, we try to find useful information from semi-structured data. In our approach, we begin by representing semi-structured data in a prototype-based approach. We then detect the most typical common structure of semist...
Using formal concept analysis with an incremental knowledge acquisition system for web document management
It is necessary to provide a method to store Web information effectively so it can be utilised as a future knowledge resource. A commonly adopted approach is to classify the retrieved information based on its content. A technique that has been found to be suitable for this purpose is Multiple Classification Ripple-Down Rules (MCRDR). The MCRDR system constructs a classification knowledge base o...